WO 2007/109923 



PCT/CN2006/000545 



CLAIMS 

What is claimed is: 

1. A method, comprising:

interacting by a learning component of a server of a network with one 
or more clients and an environment of the network; 

conducting by the learning component different trials of one or more 
options in different states for network communication via a protocol of the 
network; 

receiving, by the learning component, performance feedback for the 
different trials as rewards; and 

utilizing by the learning component the different trials and associated 
resulting rewards to improve a decision-making policy associated with the 
server for negotiation of the one or more options. 
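Claim 1 describes a trial-and-reward loop. The following Python sketch shows one way such a loop might look. It assumes an epsilon-greedy choice among a fixed list of candidate option values and keeps a running-average reward for each (state, option) pair. The candidate block sizes, the `TrialLearner` class and the `simulated_reward` environment are hypothetical illustrations and are not taken from the claims.

```python
import random
from collections import defaultdict

# Hypothetical candidate values for one option (e.g. a TFTP block size).
CANDIDATE_OPTIONS = [512, 1024, 1428, 8192]


class TrialLearner:
    """Runs trials of options per state and averages the rewards each earned."""

    def __init__(self, options, epsilon=0.1, seed=0):
        self.options = list(options)
        self.epsilon = epsilon                 # probability of an exploratory trial
        self.rng = random.Random(seed)
        self.value = defaultdict(float)        # (state, option) -> mean reward
        self.count = defaultdict(int)          # (state, option) -> number of trials

    def choose(self, state):
        # Explore with probability epsilon, otherwise exploit the best-known option.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.options)
        return self.policy(state)

    def record(self, state, option, reward):
        # Incremental mean: V <- V + (r - V) / n
        key = (state, option)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

    def policy(self, state):
        return max(self.options, key=lambda o: self.value[(state, o)])


def simulated_reward(state, blksize):
    # Hypothetical environment: small blocks suit a lossy link, large blocks a clean one.
    best = 1024 if state == "lossy" else 8192
    return -abs(blksize - best)


learner = TrialLearner(CANDIDATE_OPTIONS, epsilon=0.2, seed=42)
for _ in range(500):
    for state in ("lossy", "clean"):
        option = learner.choose(state)
        learner.record(state, option, simulated_reward(state, option))

print(learner.policy("lossy"), learner.policy("clean"))  # prints: 1024 8192
```

Starting every estimate at zero while all rewards are negative makes untried options look attractive, so each option gets tried early even before the random exploration kicks in.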

2. The method of claim 1, further comprising uploading by the learning 
component an optimum set of options based on the different trials and 
rewards and observed configurations of the environment associated with the 
optimum set of options to a centralized place. 

3. The method of claim 2, wherein one or more other servers download 
from the centralized place the optimum set of options to utilize as an initial 
point to start a new learning process in the environment of the one or more
other servers.
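The following is a minimal sketch of the upload and download exchange in claims 2 and 3. It assumes the "centralized place" is a key-value store indexed by the observed environment configuration. `CentralOptionStore`, the configuration keys and the option values are hypothetical.

```python
import json


class CentralOptionStore:
    """Hypothetical in-memory stand-in for the centralized place of claims 2 and 3."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(environment):
        # Canonical JSON so that key order in the configuration does not matter.
        return json.dumps(environment, sort_keys=True)

    def upload(self, environment, optimum_options):
        self._entries[self._key(environment)] = dict(optimum_options)

    def download(self, environment, default=None):
        # Return a copy so a downloading server cannot mutate the shared record.
        found = self._entries.get(self._key(environment))
        return dict(found) if found is not None else default


store = CentralOptionStore()
store.upload({"link": "100Mbps", "loss": "low"}, {"blksize": 8192, "timeout": 1})

# Another server in the same environment uses the stored set as its starting point.
initial = store.download({"loss": "low", "link": "100Mbps"}, default={"blksize": 512})
print(initial)  # {'blksize': 8192, 'timeout': 1}
```

This lookup only succeeds on an exact configuration match. A server in an unseen environment falls back to `default`.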

4. The method of claim 1, wherein the option negotiation component
applies a reinforcement learning algorithm to improve the decision-making 
policy associated with the server for negotiation of the one or more options. 

5. The method of claim 4, wherein the reinforcement learning algorithm
utilizes a Q-learning method.

6. The method of claim 5, wherein the Q-learning algorithm iteratively 
calculates value functions of an optimal policy for option selection by the 
option negotiation component. 
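Claims 5 and 6 refer to Q-learning. The standard tabular update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_b Q(s', b) - Q(s, a)), applied repeatedly until the values settle. The sketch below applies that update to a toy two-state problem. The states, actions, rewards, alpha = 0.5 and gamma = 0.9 are all assumptions. For brevity, each iteration sweeps every state-action pair instead of exploring with an epsilon-greedy policy. This works here because the toy environment is deterministic, so a single sample per pair gives the exact target.

```python
from itertools import product

GAMMA = 0.9                    # discount factor (assumed)
ALPHA = 0.5                    # learning rate (assumed)
STATES = ("low", "high")       # hypothetical network-load states
ACTIONS = ("small", "large")   # hypothetical block-size choices


def step(state, action):
    """Hypothetical deterministic environment: returns (reward, next_state)."""
    if state == "low":
        return (10.0, "low") if action == "large" else (4.0, "low")
    return (-5.0, "high") if action == "large" else (2.0, "low")


def q_learning(sweeps=500):
    q = {(s, a): 0.0 for s, a in product(STATES, ACTIONS)}
    for _ in range(sweeps):
        for s, a in product(STATES, ACTIONS):
            reward, nxt = step(s, a)
            target = reward + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
            # Q-learning update: move Q(s, a) a fraction ALPHA toward the target.
            q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q


def greedy_policy(q):
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q = q_learning()
print(greedy_policy(q))  # {'low': 'large', 'high': 'small'}
```

With these numbers the optimal values are Q(low, large) = 10 / (1 - 0.9) = 100 and Q(high, small) = 2 + 0.9 * 100 = 92. The iteration approaches both values.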

7. The method of claim 1, wherein the option negotiation component is 
part of a trivial file transfer protocol (TFTP) server. 
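For the TFTP case of claim 7, option negotiation follows RFC 2347. The client appends NUL-terminated name/value pairs to its read request, and the server answers with an OACK (opcode 6) that lists only the options it accepts. The sketch below assumes the learned policy has already produced a preferred block size (`learned_blksize`, hypothetical). It caps that value at the client's request, because RFC 2348 does not let the server reply with a larger blksize than the client asked for.

```python
import struct

RRQ, OACK = 1, 6   # opcodes from RFC 1350 and RFC 2347


def parse_rrq(packet):
    """Split an RRQ into (filename, mode, options); fields are NUL-terminated strings."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != RRQ:
        raise ValueError(f"not a read request: opcode {opcode}")
    fields = [f.decode("ascii") for f in packet[2:].split(b"\x00")[:-1]]
    filename, mode, rest = fields[0], fields[1], fields[2:]
    # Option names are case-insensitive (RFC 2347).
    options = {rest[i].lower(): rest[i + 1] for i in range(0, len(rest), 2)}
    return filename, mode, options


def negotiate(options, learned_blksize):
    """Hypothetical negotiation step driven by a learned block size."""
    accepted = {}
    if "blksize" in options:
        # Never answer with more than the client requested (RFC 2348).
        accepted["blksize"] = min(int(options["blksize"]), learned_blksize)
    # Options this sketch does not handle (e.g. tsize) are left out of the OACK.
    return accepted


def build_oack(accepted):
    body = b"".join(f"{k}\x00{v}\x00".encode("ascii") for k, v in accepted.items())
    return struct.pack("!H", OACK) + body


request = b"\x00\x01boot.img\x00octet\x00BLKSIZE\x008192\x00tsize\x000\x00"
filename, mode, opts = parse_rrq(request)
oack = build_oack(negotiate(opts, learned_blksize=1428))
print(oack)  # b'\x00\x06blksize\x001428\x00'
```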

8. An apparatus, comprising: 

an option negotiation component to select one or more options for a 
communication protocol, receive rewards as performance feedback 
associated with the selection of the one or more options, and adjust the 
selection of the one or more options based on the rewards; and 
a file transfer component to transfer a file utilizing an optimum set of
the one or more options selected by the option negotiation component based 
on the rewards and adjusted selections. 
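After a block size is agreed, the file transfer component of claim 8 splits the file into DATA packets. In TFTP, a packet shorter than the negotiated block size ends the transfer. A file whose length is an exact multiple of the block size therefore needs a final empty packet. The generator below sketches that rule and ignores 16-bit block-number wraparound. Classic TFTP is lock-step, with one ACK per DATA packet, so the packet count is one plausible input to the reward. Using it that way is a hypothetical choice, not one the claims specify.

```python
def data_blocks(payload, blksize):
    """Yield (block_number, chunk) pairs in the order a TFTP sender sends DATA packets.
    A chunk shorter than blksize (possibly empty) marks the end of the transfer."""
    block, offset = 1, 0
    while True:
        chunk = payload[offset:offset + blksize]
        yield block, chunk
        if len(chunk) < blksize:
            return
        block += 1
        offset += blksize


for size in (512, 1428):
    count = sum(1 for _ in data_blocks(bytes(3000), size))
    print(size, count)  # prints "512 6", then "1428 3"
```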

9. The apparatus of claim 8, wherein the option negotiation component 
applies a reinforcement learning algorithm that determines the one or more 
options to select, the performance feedback for the selection, and the 
adjustment of the selection. 

10. The apparatus of claim 9, wherein the reinforcement learning algorithm
utilizes a Q-learning algorithm.

11. The apparatus of claim 10, wherein the Q-learning algorithm
iteratively calculates value functions of an optimal policy for option 
selection by the option negotiation component. 

12. The apparatus of claim 8, wherein the option negotiation component 
and the file transfer component are components of a trivial file transfer 
protocol (TFTP) server. 

13. The apparatus of claim 8, wherein the option negotiation component
is further to upload the optimum set of options and associated configurations
of an environment associated with the optimum set of options to a 
centralized place. 
14. The apparatus of claim 13, wherein one or more servers download the
optimum set of options for an environment similar to the associated 
environment. 
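Claim 14 leaves open how "similar" environments are matched. One hypothetical approach is a nearest-neighbour lookup over numeric descriptions of each uploaded environment. The feature choice (bandwidth, round-trip time, loss rate), the normalizing ranges and the stored records below are all assumptions for illustration.

```python
import math

# Hypothetical uploaded records: (environment features, optimum options).
# Features are assumed to be (bandwidth_mbps, rtt_ms, loss_pct).
RECORDS = [
    ((100.0, 2.0, 0.0), {"blksize": 8192}),
    ((10.0, 40.0, 1.5), {"blksize": 1024}),
    ((1.0, 300.0, 5.0), {"blksize": 512}),
]
SCALE = (100.0, 300.0, 5.0)   # assumed per-feature ranges used to normalize distances


def distance(a, b):
    # Euclidean distance after dividing each feature by its assumed range.
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, SCALE)))


def most_similar(env, records=RECORDS):
    """Return a copy of the option set stored for the closest uploaded environment."""
    _, options = min(records, key=lambda r: distance(env, r[0]))
    return dict(options)


print(most_similar((80.0, 5.0, 0.1)))  # {'blksize': 8192}
```

Normalizing by a per-feature range keeps a large raw value, such as round-trip time in milliseconds, from dominating the distance.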

15. A system, comprising: 

a network environment; and 

a server communicatively coupled to the network environment via a 
network interface and including: 

an option negotiation component to select one or more options 
for a communication protocol, receive rewards as performance 
feedback associated with the selection of the one or more options, and 
adjust the selection of the one or more options based on the rewards; 
and 

a file transfer component to transfer a file utilizing an optimum 
set of the one or more options selected by the option negotiation 
component based on the rewards and adjusted selections. 

16. The system of claim 15, wherein the option negotiation component 
applies a reinforcement learning algorithm that determines the one or more 
options to select, the performance feedback for the selection, and the 
adjustment of the selection. 
17. The system of claim 16, wherein the reinforcement learning algorithm
utilizes a Q-learning algorithm.

18. The system of claim 17, wherein the Q-learning algorithm
iteratively calculates value functions of an optimal policy for option 
selection by the option negotiation component. 

19. The system of claim 15, wherein the server is a trivial file transfer 
protocol (TFTP) server. 

20. The system of claim 15, wherein the option negotiation component 
uploads an optimum set of options based on the different trials and rewards 
and observed configurations of the environment associated with the 
optimum set of options to a centralized place. 